{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Layers and Blocks\n",
    ":label:`sec_model_construction`\n",
    "\n",
    "When we first introduced neural networks,\n",
    "we focused on linear models with a single output.\n",
    "Here, the entire model consists of just a single neuron.\n",
    "Note that a single neuron\n",
    "(i) takes some set of inputs;\n",
    "(ii) generates a corresponding (*scalar*) output;\n",
    "and (iii) has a set of associated parameters that can be updated \n",
    "to optimize some objective function of interest.\n",
    "Then, once we started thinking about networks with multiple outputs,\n",
    "we leveraged vectorized arithmetic\n",
    "to characterize an entire *layer* of neurons.\n",
    "Just like individual neurons, \n",
    "layers (i) take a set of inputs, \n",
    "(ii) generate corresponding outputs,\n",
    "and (iii) are described by a set of tunable parameters.\n",
    "When we worked through softmax regression,\n",
    "a single *layer* was itself *the model*.\n",
    "However, even when we subsequently \n",
    "introduced multilayer perceptrons,\n",
    "we could still think of the model as \n",
    "retaining this same basic structure.\n",
    "\n",
    "Interestingly, for multilayer perceptrons, \n",
    "both the *entire model* and its *constituent layers* \n",
    "share this structure. \n",
    "The (entire) model takes in raw inputs (the features),\n",
    "generates outputs (the predictions),\n",
    "and possesses parameters \n",
    "(the combined parameters from all constituent layers).\n",
    "Likewise, each individual layer ingests inputs \n",
    "(supplied by the previous layer)\n",
    "generates outputs (the inputs to the subsequent layer),\n",
    "and possesses a set of tunable parameters that are updated\n",
    "according to the signal that flows backwards \n",
    "from the subsequent layer.\n",
    "\n",
    "\n",
    "While you might think that neurons, layers, and models\n",
    "give us enough abstractions to go about our business,\n",
    "it turns out that we often find it convenient\n",
    "to speak about components that are\n",
    "larger than an individual layer\n",
    "but smaller than the entire model.\n",
    "For example, the ResNet-152 architecture,\n",
    "which is wildly popular in computer vision,\n",
    "possesses hundreds of layers.\n",
    "These layers consist of repeating patterns of *groups of layers*. Implementing such a network one layer at a time can grow tedious.\n",
    "This concern is not just hypothetical---such \n",
    "design patterns are common in practice.\n",
    "The ResNet architecture mentioned above\n",
    "won the 2015 ImageNet and COCO computer vision competitions\n",
    "for both recognition and detection :cite:`He.Zhang.Ren.ea.2016`\n",
    "and remains a go-to architecture for many vision tasks.\n",
    "Similar architectures in which layers are arranged \n",
    "in various repeating patterns \n",
    "are now ubiquitous in other domains,\n",
    "including natural language processing and speech.\n",
    "\n",
    "To implement these complex networks,\n",
    "we introduce the concept of a neural network *block*.\n",
    "A block could describe a single layer,\n",
    "a component consisting of multiple layers,\n",
    "or the entire model itself!\n",
    "From a software standpoint, a `Block` is a *class*.\n",
    "Any subclass of `Block` must define a `forward` method \n",
    "that transforms its input into output\n",
    "and must store any necessary parameters.\n",
    "Note that some Blocks do not require any parameters at all!\n",
    "Finally a `Block` must possess a `backward` method,\n",
    "for purposes of calculating gradients.\n",
    "Fortunately, due to some behind-the-scenes magic\n",
    "supplied by the `autograd` package\n",
    "(introduced in :numref:`chap_preliminaries`)\n",
    "when defining our own `Block`,\n",
    "we only need to worry about parameters\n",
    "and the `forward` function.\n",
    "\n",
    "One benefit of working with the `Block` abstraction \n",
    "is that they can be combined into larger artifacts,\n",
    "often recursively, (see illustration in :numref:`fig_blocks`).\n",
    "\n",
    "![Multiple layers are combined into blocks](https://raw.githubusercontent.com/d2l-ai/d2l-en/2885330e548958282a8dec1dca724eb0e533cfa9/img/blocks.svg)\n",
    ":label:`fig_blocks`\n",
    "\n",
    "By defining code to generate Blocks \n",
    "of arbitrary complexity on demand,\n",
    "we can write surprisingly compact code\n",
    "and still implement complex neural networks.\n",
    "\n",
    "To begin, we revisit the Blocks \n",
    "that we used to implement multilayer perceptrons\n",
    "(:numref:`sec_mlp_djl`).\n",
    "The following code generates a network\n",
    "with one fully-connected hidden layer \n",
    "with 256 units and ReLU activation,\n",
    "followed by a fully-connected *output layer*\n",
    "with 10 units (no activation function)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n",
    "\n",
    "%maven ai.djl:api:0.8.0\n",
    "%maven ai.djl:model-zoo:0.8.0\n",
    "%maven org.slf4j:slf4j-api:1.7.26\n",
    "%maven org.slf4j:slf4j-simple:1.7.26\n",
    "    \n",
    "%maven net.java.dev.jna:jna:5.3.0\n",
    "%maven ai.djl.mxnet:mxnet-engine:0.8.0\n",
    "%maven ai.djl.mxnet:mxnet-native-auto:1.7.0-backport"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ai.djl.*;\n",
    "import ai.djl.ndarray.*;\n",
    "import ai.djl.ndarray.types.*;\n",
    "import ai.djl.ndarray.index.*;\n",
    "import ai.djl.nn.*;\n",
    "import ai.djl.nn.core.*;\n",
    "import ai.djl.training.*;\n",
    "import ai.djl.training.initializer.*;\n",
    "import ai.djl.training.listener.*;\n",
    "import ai.djl.training.dataset.*;\n",
    "import ai.djl.util.*;\n",
    "import ai.djl.translate.*;\n",
    "import ai.djl.inference.Predictor;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "33"
    }
   },
   "outputs": [],
   "source": [
    "NDManager manager = NDManager.newBaseManager();\n",
    "\n",
    "int inputSize = 20;\n",
    "NDArray x = manager.randomUniform(0, 1, new Shape(2, inputSize)); // (2, 20) shape\n",
    "\n",
    "Model model = Model.newInstance(\"lin-reg\");\n",
    "\n",
    "SequentialBlock net = new SequentialBlock();\n",
    "\n",
    "net.add(Linear.builder().setUnits(256).build());\n",
    "net.add(Activation.reluBlock());\n",
    "net.add(Linear.builder().setUnits(10).build());\n",
    "net.setInitializer(new NormalInitializer());\n",
    "net.initialize(manager, DataType.FLOAT32, x.getShape());\n",
    "\n",
    "model.setBlock(net);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we use a simple Translator for processing input and output. \n",
    "A `NoopTranslator` or No Operation Translator doesn't do any processing and simply takes in an NDList\n",
    "and outputs an NDList. For more complicated models, we can define our own\n",
    "translator and do preprocessing and postprocessing on the data. \n",
    "Here we pass in `null` for the `Batchifier` as we'll define classes that'll break\n",
    "the structure the default `Batchifier` expects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "// Set Batchifier to null\n",
    "Translator translator = new NoopTranslator(null);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then pass that into a model predictor to allow inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ai.djl.inference.Predictor;\n",
    "\n",
    "NDList xList = new NDList(x);\n",
    "\n",
    "Predictor predictor = model.newPredictor(translator);\n",
    "\n",
    "((NDList) predictor.predict(xList)).singletonOrThrow();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that we have to wrap `x` in an NDList since it is an NDArray before passing it in.\n",
    "Since `predict()` returns an Object, we also have to cast the return value of `predict()` to an NDList(we know that it should be an NDList from NoopTranslator) to call its `singletonOrThrow()` function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example, we constructed\n",
    "our model by instantiating an `SequentialBlock`,\n",
    "assigning the returned object to the `net` variable.\n",
    "Next, we repeatedly call its `add()` method,\n",
    "appending layers in the order\n",
    "that they should be executed.\n",
    "In short, `SequentialBlock` defines a special kind of `AbstractBlock`\n",
    "that maintains an ordered list of constituent `AbstractBlocks`.\n",
    "The `add()` method simply facilitates\n",
    "the addition of each successive `AbstractBlock` to the list.\n",
    "Note that each layer is an instance of the `Linear` class\n",
    "which is itself a subclass of `AbstractBlock`.\n",
    "The `forward()` function is also remarkably simple:\n",
    "it chains each Block in the list together,\n",
    "passing the output of each as the input to the next.\n",
    "\n",
    "\n",
    "## A Custom Block\n",
    "\n",
    "Perhaps the easiest way to develop intuition\n",
    "about how `Block` works\n",
    "is to implement one ourselves.\n",
    "Before we implement our own custom `Block`,\n",
    "we briefly summarize the basic functionality\n",
    "that each `Block` must provide:\n",
    "\n",
    "1. Ingest input data as arguments to its `forward()` method.\n",
    "1. Generate an output by having `forward()` return a value. \n",
    "   Note that the output may have a different shape from the input.      For example, the first Dense layer in our model above ingests an      input of arbitrary dimension but returns \n",
    "   an output of dimension 256.\n",
    "1. Calculate the gradient of its output with respect to its input,      which can be accessed via its `backward()` method. \n",
    "   Typically this happens automatically.\n",
    "1. Store and provide access to those parameters necessary \n",
    "   to execute the `forward()` computation.\n",
    "1. Initialize these parameters as needed.\n",
    "\n",
    "In the following snippet,\n",
    "we code up a Block from scratch\n",
    "corresponding to a multilayer perceptron\n",
    "with one hidden layer with 256 hidden nodes, \n",
    "and a 10-dimensional output layer.\n",
    "Note that the `MLP` class below inherits the `AbstractBlock` class.\n",
    "We will rely heavily on the parent class's methods,\n",
    "as well as implement its required to be overriden methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "34"
    }
   },
   "outputs": [],
   "source": [
    "class MLP extends AbstractBlock {\n",
    "    private static final byte VERSION = 2;\n",
    "    \n",
    "    private Block flattenInput;\n",
    "    private Block hidden256;\n",
    "    private Block output10;\n",
    "    \n",
    "    // Declare a layer with model parameters. Here, we declare two fully\n",
    "    // connected layers\n",
    "    public MLP(int inputSize) {\n",
    "        super(VERSION); // Dont need to worry about this\n",
    "        \n",
    "        flattenInput = addChildBlock(\"flattenInput\", Blocks.batchFlattenBlock(inputSize));\n",
    "        hidden256 = addChildBlock(\"hidden256\", Linear.builder().setUnits(256).build());// Hidden Layer\n",
    "        output10 = addChildBlock(\"output10\", Linear.builder().setUnits(10).build()); // Output Layer\n",
    "    }\n",
    "    \n",
    "    @Override\n",
    "    // Define the forward computation of the model, that is, how to return\n",
    "    // the required model output based on the input x\n",
    "    public NDList forward(\n",
    "            ParameterStore parameterStore,\n",
    "            NDList inputs,\n",
    "            boolean training,\n",
    "            PairList<String, Object> params) {\n",
    "        NDList current = inputs;\n",
    "        current = flattenInput.forward(parameterStore, current, training);\n",
    "        current = hidden256.forward(parameterStore, current, training);\n",
    "        // We use the Activation.relu() function here\n",
    "        // Since it takes in an NDArray, we call `singletonOrThrow()`\n",
    "        // on the NDList `current` to get the NDArray and then\n",
    "        // wrap it in a new NDList to be passed \n",
    "        // to the next `forward()` call\n",
    "        current = new NDList(Activation.relu(current.singletonOrThrow()));\n",
    "        current = output10.forward(parameterStore, current, training);\n",
    "        return current;\n",
    "    }\n",
    "    \n",
    "    @Override\n",
    "    public Shape[] getOutputShapes(NDManager manager, Shape[] inputs) {\n",
    "        Shape[] current = inputs;\n",
    "        for (Block block : children.values()) {\n",
    "            current = block.getOutputShapes(manager, current);\n",
    "        }\n",
    "        return current;\n",
    "    }\n",
    "    \n",
    "    @Override\n",
    "    public void initializeChildBlocks(NDManager manager, DataType dataType, Shape... inputShapes) {\n",
    "        hidden256.initialize(manager, dataType, new Shape(1, inputSize));\n",
    "        output10.initialize(manager, dataType, new Shape(1, 256));\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To begin, let us focus on the `forward()` method.\n",
    "Note that it takes a `ParameterStore`, `NDList` input, `boolean` training, \n",
    "and `PairList<>` params as input,\n",
    "but for now you only need to care about the `NDList` input. It then\n",
    "passes the inputs through each layer \n",
    "In this MLP implementation,\n",
    "both layers are instance variables.\n",
    "To see why this is reasonable, imagine\n",
    "instantiating two MLPs, `net1` and `net2`,\n",
    "and training them on different data.\n",
    "Naturally, we would expect them\n",
    "to represent two different learned models.\n",
    "\n",
    "We instantiate the MLP's layers\n",
    "in the `initializeChildBlocks()` method\n",
    "and subsequently invoke these layers\n",
    "on each call to the `forward()` method.\n",
    "Note a few key details.\n",
    "First, our customized `initializeChildBlocks()` method \n",
    "invokes each child class's `initialize()` method,\n",
    "sparing us the pain of restating\n",
    "boilerplate code applicable to most Blocks.\n",
    "We then instantiate our two `Linear` layers,\n",
    "adding them to `hidden256` and `output10`.\n",
    "Note that unless we implement a new operator,\n",
    "we need not worry about backpropagation (the `backward` method)\n",
    "or parameter initialization (the `initialize` method).\n",
    "DJL will generate these methods automatically.\n",
    "We also don't have to call `initializeChildBlocks()` and instead simply call\n",
    "`AbstractBlock`'s `initialize()` method as `AbstractBlock` automatically calls \n",
    "`initializeChildBlocks()` in it along with a few other things we\n",
    "don't need to worry about for now.\n",
    "Let us try this out:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "35"
    }
   },
   "outputs": [],
   "source": [
    "MLP net = new MLP(inputSize);\n",
    "net.setInitializer(new NormalInitializer());\n",
    "net.initialize(manager, DataType.FLOAT32, x.getShape());\n",
    "\n",
    "model.setBlock(net);\n",
    "\n",
    "Predictor predictor = model.newPredictor(translator);\n",
    "\n",
    "((NDList) predictor.predict(xList)).singletonOrThrow();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A key virtue of the `Block` abstraction is its versatility.\n",
    "We can subclass `AbstractBlock` to create layers\n",
    "(such as the `Linear` block provided by DJL),\n",
    "entire models (such as the `MLP` above),\n",
    "or various components of intermediate complexity.\n",
    "We exploit this versatility\n",
    "throughout the following chapters,\n",
    "especially when addressing \n",
    "convolutional neural networks.\n",
    "\n",
    "\n",
    "## The Sequential Block\n",
    "\n",
    "We can now take a closer look \n",
    "at how the `SequentialBlock` class works.\n",
    "Recall that `SequentialBlock` was designed \n",
    "to daisy-chain other Blocks together.\n",
    "To build our own simplified `MySequential`,\n",
    "we just need to define two key methods:\n",
    "1. An `add()` method for appending Blocks one by one to a list.\n",
    "2. A `forward()` method to pass an input through the chain of Blocks\n",
    "(in the same order as they were appended).\n",
    "\n",
    "Additional helper methods we need to define are:\n",
    "1. An `initializeChildBlocks()` method for child block initialization.\n",
    "2. A `getOutputShapes()` method for return the output shape.\n",
    "\n",
    "The following `MySequential` class delivers the same \n",
    "functionality as DJL's default `SequentialBlock` class:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "36"
    }
   },
   "outputs": [],
   "source": [
    "class MySequential extends AbstractBlock {\n",
    "    private static final byte VERSION = 2;\n",
    "    \n",
    "    public MySequential() {\n",
    "        super(VERSION);\n",
    "    }\n",
    "    \n",
    "    public MySequential add(Block block) {\n",
    "        // Here, block is an instance of a Block subclass, and we assume it has\n",
    "        // a unique name. We add the child block to the children BlockList\n",
    "        // with `addChildBlock()` which is defined in AbstractBlock.\n",
    "        if (block != null) {\n",
    "            addChildBlock(block.getClass().getSimpleName(), block);\n",
    "        }\n",
    "        return this;\n",
    "    }\n",
    "    \n",
    "    @Override\n",
    "    public NDList forward(\n",
    "            ParameterStore parameterStore,\n",
    "            NDList inputs,\n",
    "            boolean training,\n",
    "            PairList<String, Object> params) {\n",
    "        NDList current = inputs;\n",
    "        for (Block block : children.values()) {\n",
    "            // BlockList guarantees that members will be traversed in the order\n",
    "            // they were added\n",
    "            current = block.forward(parameterStore, current, training);\n",
    "        }\n",
    "        return current;\n",
    "    }\n",
    "    \n",
    "    @Override\n",
    "    // Initializes all child blocks\n",
    "    public void initializeChildBlocks(NDManager manager, DataType dataType, Shape... inputShapes) {\n",
    "        Shape[] shapes = inputShapes;\n",
    "        for (Block child : getChildren().values()) {\n",
    "            shapes = child.initialize(manager, dataType, shapes);\n",
    "        }\n",
    "    }\n",
    "    \n",
    "    @Override\n",
    "    public Shape[] getOutputShapes(NDManager manager, Shape[] inputs) {\n",
    "        return inputs;\n",
    "    }   \n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `add()` method adds a single Block \n",
    "to the block list `children` by using the `addChildBlock()` method implemented in `AbstractBlock`. \n",
    "You might wonder why every DJL `Block` \n",
    "possesses a `children` attribute \n",
    "and why we used it rather than just \n",
    "defining a Java list ourselves.\n",
    "In short the chief advantage of `children`\n",
    "is that during our Block's parameter inititialization,\n",
    "DJL knows to look in the `children`\n",
    "list to find sub-Blocks whose \n",
    "parameters also need to be initialized.\n",
    "\n",
    "When our `MySequential` Block's `forward()` method is invoked,\n",
    "each added `Block` is executed \n",
    "in the order in which they were added.\n",
    "We can now reimplement an MLP \n",
    "using our `MySequential` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "37"
    }
   },
   "outputs": [],
   "source": [
    "MySequential net = new MySequential();\n",
    "net.add(Linear.builder().setUnits(256).build());\n",
    "net.add(Activation.reluBlock());\n",
    "net.add(Linear.builder().setUnits(10).build());\n",
    "\n",
    "net.setInitializer(new NormalInitializer());\n",
    "net.initialize(manager, DataType.FLOAT32, x.getShape());\n",
    "\n",
    "Model model = Model.newInstance(\"my-sequential\");\n",
    "model.setBlock(net);\n",
    "\n",
    "Predictor predictor = model.newPredictor(translator);\n",
    "((NDList) predictor.predict(xList)).singletonOrThrow();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that this use of `MySequential`\n",
    "is identical to the code we previously wrote \n",
    "for the DJL `SequentialBlock` class \n",
    "(as described in :numref:`sec_mlp_djl`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Executing Code in the `forward` Method\n",
    "\n",
    "The `SequentialBlock` class makes model construction easy,\n",
    "allowing us to assemble new architectures\n",
    "without having to define our own class.\n",
    "However, not all architectures are simple daisy chains.\n",
    "When greater flexibility is required,\n",
    "we will want to define our own `Block`s.\n",
    "For example, we might want to execute \n",
    "Java's control flow within the forward method.\n",
    "Moreover we might want to perform\n",
    "arbitrary mathematical operations,\n",
    "not simply relying on predefined neural network layers.\n",
    "\n",
    "You might have noticed that until now,\n",
    "all of the operations in our networks\n",
    "have acted upon our network's activations\n",
    "and its parameters. \n",
    "Sometimes, however, we might want to \n",
    "incorporate terms \n",
    "that are neither the result of previous layers\n",
    "nor updatable parameters. \n",
    "In DJL, we call these *constant* parameters. \n",
    "Say for example that we want a layer\n",
    "that calculates the function \n",
    "$f(\\mathbf{x},\\mathbf{w}) = c \\cdot \\mathbf{w}^\\top \\mathbf{x}$,\n",
    "where $\\mathbf{x}$ is the input, $\\mathbf{w}$ is our parameter,\n",
    "and $c$ is some specified constant \n",
    "that is not updated during optimization.\n",
    "\n",
    "In the following code, we will implement a model\n",
    "that could not easily be assembled\n",
    "using only predefined layers and `SequentialBlock`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "38"
    }
   },
   "outputs": [],
   "source": [
    "class FixedHiddenMLP extends AbstractBlock {\n",
    "    private static final byte VERSION = 2;\n",
    "    private Block hidden20;\n",
    "    private NDArray constantParamWeight;\n",
    "    private NDArray constantParamBias;\n",
    "    \n",
    "    public FixedHiddenMLP() {\n",
    "        super(VERSION);\n",
    "        hidden20 = addChildBlock(\"denseLayer\", Linear.builder().setUnits(20).build());\n",
    "    }\n",
    "    \n",
    "    @Override\n",
    "    public NDList forward(\n",
    "            ParameterStore parameterStore,\n",
    "            NDList inputs,\n",
    "            boolean training,\n",
    "            PairList<String, Object> params) {\n",
    "        NDList current = inputs;\n",
    "\n",
    "        // Fully connected layer\n",
    "        current = hidden20.forward(parameterStore, current, training);\n",
    "        // Use the constant parameters NDArray\n",
    "        // Call the NDArray internal method `linear()` to do calculation\n",
    "        current = Linear.linear(current.singletonOrThrow(), constantParamWeight, constantParamBias);\n",
    "        // Relu Activation\n",
    "        current = new NDList(Activation.relu(current.singletonOrThrow()));\n",
    "        // Reuse the fully connected layer. This is equivalent to sharing\n",
    "        // parameters with two fully connected layers\n",
    "        current = hidden20.forward(parameterStore, current, training);\n",
    "        \n",
    "        // Here in Control flow, we return the scalar\n",
    "        // for comparison\n",
    "        while (current.head().abs().sum().getFloat() > 1) {\n",
    "            current.head().divi(2);\n",
    "        }\n",
    "        return new NDList(current.head().abs().sum());\n",
    "    }\n",
    "    \n",
    "    @Override\n",
    "    public void initializeChildBlocks(NDManager manager, DataType dataType, Shape... inputShapes) {\n",
    "        Shape[] shapes = inputShapes;\n",
    "        for (Block child : getChildren().values()) {\n",
    "            shapes = child.initialize(manager, dataType, shapes);\n",
    "        }\n",
    "        // Initialize constant parameter layer\n",
    "        constantParamWeight = manager.randomUniform(-0.07f, 0.07f, new Shape(20, 20));\n",
    "        constantParamBias = manager.zeros(new Shape(20));\n",
    "    }\n",
    "\n",
    "    @Override\n",
    "    public Shape[] getOutputShapes(NDManager manager, Shape[] inputs) {\n",
    "        return new Shape[]{new Shape(1)}; // we return a scalar so the shape is 1\n",
    "    } \n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this `FixedHiddenMLP` model,\n",
    "we implement a hidden layer whose weights are initialized randomly\n",
    "at instantiation and are thereafter constant. \n",
    "This weight is not a model parameter\n",
    "and thus it is never updated by backpropagation.\n",
    "The network then passes the output of this *fixed* layer\n",
    "through a `Linear` layer. \n",
    "\n",
    "Note that before returning output,\n",
    "our model did something unusual.\n",
    "We ran a `while` loop, testing \n",
    "on the condition `np.abs(x).sum() > 1`,\n",
    "and dividing our output vector by $2$ \n",
    "until it satisfied the condition.\n",
    "Finally, we returned the sum of the entries in `x`.\n",
    "To our knowledge, no standard neural network\n",
    "performs this operation.\n",
    "Note that this particular operation may not be useful\n",
    "in any real world task. \n",
    "Our point is only to show you how to integrate\n",
    "arbitrary code into the flow of your \n",
    "neural network computations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xList"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "39"
    }
   },
   "outputs": [],
   "source": [
    "FixedHiddenMLP net = new FixedHiddenMLP();\n",
    "\n",
    "net.setInitializer(new NormalInitializer());\n",
    "net.initialize(manager, DataType.FLOAT32, x.getShape());\n",
    "\n",
    "Model model = Model.newInstance(\"fixed-mlp\");\n",
    "model.setBlock(net);\n",
    "\n",
    "Predictor predictor = model.newPredictor(translator);\n",
    "((NDList) predictor.predict(xList)).singletonOrThrow();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With DJL, we can mix and match various \n",
    "ways of assembling `Block`s together.\n",
    "In the following example, we nest `Block`s\n",
    "in some creative ways."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "40"
    }
   },
   "outputs": [],
   "source": [
    "class NestMLP extends AbstractBlock {\n",
    "    private static final byte VERSION = 2;\n",
    "    private SequentialBlock net;\n",
    "    private Block dense;\n",
    "    \n",
    "    private Block test;\n",
    "    public NestMLP() {\n",
    "        super(VERSION);\n",
    "        net = new SequentialBlock();\n",
    "        net.add(Linear.builder().setUnits(64).build());\n",
    "        net.add(Activation.reluBlock());\n",
    "        net.add(Linear.builder().setUnits(32).build());\n",
    "        net.add(Activation.reluBlock());\n",
    "        addChildBlock(\"net\", net);\n",
    "        \n",
    "        dense = addChildBlock(\"dense\", Linear.builder().setUnits(16).build());\n",
    "    }\n",
    "    \n",
    "    @Override\n",
    "    public NDList forward(\n",
    "            ParameterStore parameterStore,\n",
    "            NDList inputs,\n",
    "            boolean training,\n",
    "            PairList<String, Object> params) {\n",
    "        NDList current = inputs;\n",
    "\n",
    "        // Fully connected layer\n",
    "        current = net.forward(parameterStore, current, training);\n",
    "        current = dense.forward(parameterStore, current, training);\n",
    "        current = new NDList(Activation.relu(current.singletonOrThrow()));\n",
    "        return current;\n",
    "    }\n",
    "    \n",
    "    @Override\n",
    "    public Shape[] getOutputShapes(NDManager manager, Shape[] inputs) {\n",
    "        Shape[] current = inputs;\n",
    "        for (Block block : children.values()) {\n",
    "            current = block.getOutputShapes(manager, current);\n",
    "        }\n",
    "        return current;\n",
    "    } \n",
    "    \n",
    "    @Override\n",
    "    public void initializeChildBlocks(NDManager manager, DataType dataType, Shape... inputShapes) {\n",
    "        Shape[] shapes = inputShapes;\n",
    "        for (Block child : getChildren().values()) {\n",
    "            shapes = child.initialize(manager, dataType, shapes);\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "SequentialBlock chimera = new SequentialBlock();\n",
    "\n",
    "chimera.add(new NestMLP());\n",
    "chimera.add(Linear.builder().setUnits(20).build());\n",
    "chimera.add(new FixedHiddenMLP());\n",
    "\n",
    "chimera.setInitializer(new NormalInitializer());\n",
    "chimera.initialize(manager, DataType.FLOAT32, x.getShape());\n",
    "Model model = Model.newInstance(\"chimera\");\n",
    "model.setBlock(chimera);\n",
    "\n",
    "Predictor predictor = model.newPredictor(translator);\n",
    "((NDList) predictor.predict(xList)).singletonOrThrow();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "* Layers are Blocks.\n",
    "* A Block can contain many layers.\n",
    "* A Block can contain many Blocks.\n",
    "* A Block can contain code.\n",
    "* Blocks take care of lots of housekeeping, including parameter initialization and backpropagation.\n",
    "* Sequential concatenations of layers and blocks are handled by the `SequentialBlock` Block.\n",
    "\n",
    "\n",
    "## Exercises\n",
    "\n",
    "1. Implement a block that takes two blocks as an argument, say `net1` and `net2` and returns the concatenated output of both networks in the forward pass (this is also called a parallel block).\n",
    "1. Assume that you want to concatenate multiple instances of the same network. Implement a factory function that generates multiple instances of the same block and build a larger network from it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Java",
   "language": "java",
   "name": "java"
  },
  "language_info": {
   "codemirror_mode": "java",
   "file_extension": ".jshell",
   "mimetype": "text/x-java-source",
   "name": "Java",
   "pygments_lexer": "java",
   "version": "11.0.5+10-LTS"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
